TABLE 1.1
Results reported in BinaryConnect [48] and BinaryNet [99].

Method                                            | MNIST        | CIFAR-10
BinaryConnect (only binary weights)               | 1.29 ± 0.08% | 9.90%
BinaryNet (binary both weights and activations)   | 1.40%        | 10.15%
1.1 Principal Methods
This section will review binary and 1-bit neural networks and highlight their similarities
and differences.
1.1.1 Early Binary Neural Networks
BinaryConnect [48] was the first work to restrict the weights to +1 or −1 during propagation, while leaving the inputs unbinarized. The binary operations involved are simple and readily understandable. One way to binarize the weights of a CNN is deterministically, using the sign function:
\omega_b =
\begin{cases}
  +1, & \text{if } \omega \geq 0 \\
  -1, & \text{otherwise},
\end{cases}
\qquad (1.1)
where ω_b is the binarized weight and ω the real-valued weight. A second way is to binarize stochastically:
\omega_b =
\begin{cases}
  +1, & \text{with probability } p = \sigma(\omega) \\
  -1, & \text{with probability } 1 - p,
\end{cases}
\qquad (1.2)
where σ is the “hard sigmoid” function, σ(x) = clip((x + 1)/2, 0, 1). The training process for these networks differs slightly from that of full-precision neural networks. Forward propagation uses the binarized weights instead of the full-precision weights, but backward propagation is the same as in conventional methods: the gradient ∂C/∂ω_b (where C is the cost function) is calculated and then combined with the learning rate to update the full-precision weights directly.
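The procedure can be sketched in a few lines of NumPy. The helper names (hard_sigmoid, binarize_deterministic, binarize_stochastic), the toy gradient, and the weight shapes below are illustrative only, not code from the original papers; the sketch assumes the hard sigmoid σ(x) = clip((x + 1)/2, 0, 1) used by BinaryConnect.

import numpy as np

def hard_sigmoid(x):
    # "Hard sigmoid" from BinaryConnect: clip((x + 1) / 2, 0, 1).
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def binarize_deterministic(w):
    # Eq. (1.1): sign binarization, w >= 0 -> +1, otherwise -> -1.
    return np.where(w >= 0, 1.0, -1.0)

def binarize_stochastic(w, rng):
    # Eq. (1.2): +1 with probability p = sigma(w), -1 with probability 1 - p.
    p = hard_sigmoid(w)
    return np.where(rng.random(w.shape) < p, 1.0, -1.0)

# One BinaryConnect-style update step (toy example).
rng = np.random.default_rng(0)
w_real = rng.normal(size=(4, 3))            # full-precision weights kept for training
w_bin = binarize_deterministic(w_real)      # forward pass uses the binarized weights
grad_wb = np.full_like(w_bin, 0.1)          # stands in for dC/dw_b from backpropagation
lr = 0.01
w_real -= lr * grad_wb                      # gradient w.r.t. w_b updates the real weights
w_real = np.clip(w_real, -1.0, 1.0)         # clip real weights to [-1, 1], as in BinaryConnect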
BinaryConnect only binarizes the weights, while BinaryNet [99] quantizes both the weights and activations. Like BinaryConnect, BinaryNet offers two ways to constrain weights and activations to either +1 or −1. BinaryNet also makes several changes to adapt to binary activations. The first is shift-based Batch Normalization (SBN), which avoids additional multiplications. The second is shift-based AdaMax in place of the Adam learning rule, which likewise reduces the number of multiplications. The third change concerns the input of the first layer: BinaryNet handles the continuous-valued inputs of the first layer as fixed-point numbers with m bits of precision. Training neural networks with extremely low-bit weights and activations was proposed as QNN [100]; as we are primarily reviewing work on binary networks, the details of QNN are omitted here. The error rates of these networks on representative datasets are shown in Table 1.1. However, these two networks perform unsatisfactorily on larger datasets, since weights constrained to +1 and −1 cannot be learned effectively. New methods for training BNNs and 1-bit networks are therefore needed.
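As a rough illustration of the shift-based idea, the following NumPy sketch approximates each scaling factor by a signed power of two (AP2), so that the corresponding multiplication could be realized as a bit shift on binary hardware. The function names ap2 and shift_based_batch_norm and the exact formulation are an illustrative reconstruction under these assumptions, not code from BinaryNet.

import numpy as np

def ap2(x):
    # Approximate power-of-two: round |x| to the nearest power of two, keep the sign.
    # Multiplying by ap2(x) can then be implemented as a bit shift.
    return np.sign(x) * 2.0 ** np.round(np.log2(np.abs(x) + 1e-12))

def shift_based_batch_norm(x, gamma, beta, eps=1e-5):
    # Sketch of shift-based Batch Normalization over a mini-batch (rows of x):
    # the scalings by 1/std and by gamma are replaced with power-of-two factors,
    # so the multiplications reduce to shifts.
    centered = x - x.mean(axis=0)
    var = (centered * ap2(centered)).mean(axis=0)      # shift-based variance estimate
    x_hat = centered * ap2(1.0 / np.sqrt(var + eps))   # shift-based normalization
    return ap2(gamma) * x_hat + beta

# Example: normalize a batch of 8 samples with 5 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
out = shift_based_batch_norm(x, gamma=np.ones(5), beta=np.zeros(5))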
Wang et al. [234] proposed Binarized Deep Neural Networks (BDNNs) for image clas-
sification tasks, where all the values and operations in the network are binarized. While
BinaryNet deals with CNNs, BDNNs target basic artificial neural networks consisting of
fully connected layers. Bitwise neural networks [117] likewise present a completely bitwise network in which all participating variables are bipolar binary values.